Computation of compression parameters via OpenVINO models #2727

nikita-savelyevv · 2024-06-11T13:45:12Z

Changes

Implemented OpenVINO model graphs that compute compressed and decompressed weights. Because these models are compiled, the computation becomes significantly faster, especially for larger models and int4 compression.
This functionality is exposed through two methods in weight_lowering.py:
- do_int_quantization() is used for computing a compressed weight. Possible signatures:
  - weight -> compressed_weight, scale, (zero_point for asymmetric compression)
  - weight, scale, (zero_point) -> compressed_weight, scale, (zero_point)
- calculate_quantized_dequantized_weight() is used for computing a decompressed weight. Possible signatures:
  - weight -> decompressed_weight
  - weight, scale, (zero_point) -> decompressed_weight
  - weight -> decompressed_weight, compressed_weight, scale, (zero_point)
  - weight, scale, (zero_point) -> decompressed_weight, compressed_weight, scale, (zero_point)
- Output scale and zero_point are the same as the ones given as input (if they were given at all).
- Computation is done via OV models only if openvino package is installed and input tensors are not torch tensors.
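
The signatures listed above can be illustrated with a minimal NumPy sketch. This is not NNCF's implementation: the function names `int_quantize_sketch` and `quantize_dequantize_sketch`, the `return_compressed` flag, the per-row reduction over the last axis, and the min/max scale formula are all assumptions made for illustration. Only the input/output shapes of the signatures follow the description.

```python
from typing import Optional, Tuple

import numpy as np


def int_quantize_sketch(
    weight: np.ndarray,
    num_bits: int = 4,
    asymmetric: bool = True,
    scale: Optional[np.ndarray] = None,
    zero_point: Optional[np.ndarray] = None,
) -> Tuple[np.ndarray, np.ndarray, Optional[np.ndarray]]:
    """Hypothetical stand-in for do_int_quantization(); statistics are taken per row."""
    weight = weight.astype(np.float32)
    eps = np.finfo(np.float32).eps
    if asymmetric:
        level_low, level_high = 0, 2**num_bits - 1
        if scale is None:
            # Extend the range so that zero is always representable (assumed convention).
            w_min = np.minimum(weight.min(axis=-1, keepdims=True), 0.0)
            w_max = np.maximum(weight.max(axis=-1, keepdims=True), 0.0)
            scale = np.maximum((w_max - w_min) / level_high, eps)
            zero_point = np.clip(np.round(-w_min / scale), level_low, level_high)
        elif zero_point is None:
            raise ValueError("asymmetric mode needs zero_point when scale is given")
        quantized = np.round(weight / scale + zero_point)
        dtype = np.uint8
    else:
        level_low, level_high = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
        if scale is None:
            abs_max = np.abs(weight).max(axis=-1, keepdims=True)
            scale = np.maximum(abs_max / -level_low, eps)
        zero_point = None  # symmetric compression has no zero point
        quantized = np.round(weight / scale)
        dtype = np.int8
    compressed = np.clip(quantized, level_low, level_high).astype(dtype)
    # A scale / zero_point passed in by the caller is returned unchanged.
    return compressed, scale, zero_point


def quantize_dequantize_sketch(
    weight: np.ndarray,
    num_bits: int = 4,
    asymmetric: bool = True,
    scale: Optional[np.ndarray] = None,
    zero_point: Optional[np.ndarray] = None,
    return_compressed: bool = False,
):
    """Hypothetical stand-in for calculate_quantized_dequantized_weight()."""
    compressed, scale, zero_point = int_quantize_sketch(
        weight, num_bits, asymmetric, scale, zero_point
    )
    decompressed = compressed.astype(np.float32)
    if zero_point is not None:
        decompressed = decompressed - zero_point
    decompressed = decompressed * scale
    if return_compressed:
        return decompressed, compressed, scale, zero_point
    return decompressed


rng = np.random.default_rng(0)
w = rng.standard_normal((4, 16)).astype(np.float32)

# weight -> compressed_weight, scale, zero_point
q, s, zp = int_quantize_sketch(w)
# weight, scale, zero_point -> compressed_weight, scale, zero_point
q_again, s_again, zp_again = int_quantize_sketch(w, scale=s, zero_point=zp)
# weight, scale, zero_point -> decompressed_weight
w_hat = quantize_dequantize_sketch(w, scale=s, zero_point=zp)
# symmetric variant: no zero point is returned
q_sym, s_sym, zp_sym = int_quantize_sketch(w, asymmetric=False)
```

In this sketch, passing a precomputed `scale` and `zero_point` skips the statistics step entirely, which mirrors the statement that the output parameters equal the inputs when they are provided.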
Introduced a new NNCF Tensor backend for storing instances of openvino.Tensor. The implementation of this backend is limited to the required functionality; for example, addition of OV Tensors is not supported because it is not needed.
- Introduction of OV Tensors is required for seamless handling of tensors in bf16, u4 and i4 data types. For example, bf16 constants are read from an OpenVINO LLM and given as inputs to a compressing OpenVINO model. u4 and i4 compressed weights are seamlessly inserted into the resulting compressed OpenVINO model.
- Added as_numpy_tensor() method to convert an NNCF Tensor to numpy backend. Currently only OV -> NP conversion is required.
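
NumPy has no native bf16 or 4-bit integer dtype, which is one reason a dedicated tensor container is convenient for these types. The sketch below only illustrates the underlying bit layouts in plain NumPy. It does not use the openvino.Tensor API, and the helper names, the truncating bf16 conversion, and the low-nibble-first packing order are assumptions made for the example.

```python
import numpy as np


def float32_to_bf16_bits(x: np.ndarray) -> np.ndarray:
    """Keep the upper 16 bits of each float32 value (truncation, no rounding)."""
    bits = np.ascontiguousarray(x, dtype=np.float32).view(np.uint32)
    return (bits >> 16).astype(np.uint16)


def bf16_bits_to_float32(bits: np.ndarray) -> np.ndarray:
    """Widen bf16 bit patterns back to float32 by zero-filling the low 16 bits."""
    return (bits.astype(np.uint32) << 16).view(np.float32)


def pack_u4(values: np.ndarray) -> np.ndarray:
    """Pack pairs of 4-bit unsigned values into single bytes (assumed low nibble first)."""
    flat = values.astype(np.uint8).ravel()
    if flat.size % 2 != 0:
        raise ValueError("need an even number of values to pack into bytes")
    if flat.size and flat.max() > 15:
        raise ValueError("values must fit into 4 bits")
    return flat[0::2] | (flat[1::2] << 4)


def unpack_u4(packed: np.ndarray) -> np.ndarray:
    """Inverse of pack_u4: split every byte into its two 4-bit halves."""
    out = np.empty(packed.size * 2, dtype=np.uint8)
    out[0::2] = packed & 0x0F
    out[1::2] = packed >> 4
    return out


exact = np.array([1.0, -2.5, 0.15625], dtype=np.float32)  # representable in bf16
roundtrip = bf16_bits_to_float32(float32_to_bf16_bits(exact))

u4_values = np.array([1, 15, 0, 7], dtype=np.uint8)
packed = pack_u4(u4_values)
unpacked = unpack_u4(packed)
```

Without such a container, every consumer would have to carry the packed bytes and the logical element type separately, as the helper functions above do by hand.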
All calculations are aligned with the reference numpy implementation. Some performance and memory sacrifices had to be made to achieve this alignment.

Figures (images not preserved): data-free asymmetric compression, data-free symmetric compression, data-aware compression.

Reason for changes

Reduces model compression time. Only the OpenVINO model compression backend is affected.

Related tickets

139047

Tests

tests/openvino/native/quantization/test_ov_modeling_compression.py::test_quantization_alignment -- checks alignment with the reference numpy implementation
tests/openvino/native/test_openvino_modeling.py -- checks OV modeling framework hyperparameters
tests/openvino/native/test_tensor.py -- NNCF OV Tensor backend tests

Validation jobs:

NNCF/job/manual/job/post_training_weight_compression/299/
NNCF/job/nightly/job/test_examples/650
OVVP validation ✅
optimum-intel test job https://github.com/huggingface/optimum-intel/actions/runs/12912964434/job/36009036879?pr=734

alexsu52

LGTM

Please address minor comments from the last round of review.
The comment Computation of compression parameters via OpenVINO models #2727 (comment) will be addressed in the follow-up PR.

### Changes

Follow up to #2727

1. Do not use `infer_request.results`
2. Replace `>=` with `opset.greater_equal()`
3. Rename `ov_numeric.py` to `openvino_numeric.py`

### Reason for changes

1. Improve int4 compression time by up to ~10%
2. Avoid warning: `DeprecationWarning: greater_equal is deprecated and will be removed in version 2025.3. Use ops.greater_equal instead`
3. Fix onnx install test

### Related tickets

139047

### Tests

- https://github.com/openvinotoolkit/nncf/actions/runs/12947249537
- NNCF/job/manual/job/post_training_weight_compression/301/
- NNCF/job/nightly/job/test_examples/653/

github-actions bot added NNCF Common NNCF OpenVINO NNCF PTQ labels Jun 11, 2024

alexsu52 reviewed Jun 13, 2024

nikita-savelyevv force-pushed the compress-via-openvino branch 4 times, most recently from 55cafaa to a68a63d Compare July 3, 2024 18:31

nikita-savelyevv force-pushed the compress-via-openvino branch 4 times, most recently from 6b98ddd to 3d9faa4 Compare July 16, 2024 14:19

nikita-savelyevv force-pushed the compress-via-openvino branch 6 times, most recently from 1c85732 to b527cac Compare September 6, 2024 11:11

github-actions bot added the documentation label Sep 6, 2024

nikita-savelyevv force-pushed the compress-via-openvino branch 2 times, most recently from ac3ea02 to 2a3a63c Compare September 11, 2024 12:59

nikita-savelyevv force-pushed the compress-via-openvino branch from c9569bb to a151d99 Compare October 11, 2024 11:51

nikita-savelyevv force-pushed the compress-via-openvino branch 2 times, most recently from fe30c13 to 19ea412 Compare October 21, 2024 08:52

alexsu52 requested a review from AlexanderDokuchaev October 22, 2024 09:32

alexsu52 reviewed Oct 22, 2024

nikita-savelyevv force-pushed the compress-via-openvino branch 3 times, most recently from eef34f8 to ca3447c Compare October 26, 2024 13:40

nikita-savelyevv force-pushed the compress-via-openvino branch from ca3447c to f3891cd Compare October 29, 2024 15:19

nikita-savelyevv added 2 commits January 16, 2025 11:18

Remove Optional type hint

0698c17

Remove DuplicateFilter

1d5a7d7

github-actions bot added the NNCF PT label Jan 16, 2025

AlexanderDokuchaev reviewed Jan 16, 2025

alexsu52 reviewed Jan 17, 2025

kshpv mentioned this pull request Jan 17, 2025

[Torch][WeightCompression] Add Scale Estimation data-aware support #3179

Open

nikita-savelyevv added 4 commits January 20, 2025 11:23

Move ov models to separate module

990fd72

Introduce NNCFLogger

2054e46

Addressed other comments

07e3060

Fixed caching test

9cc933a

alexsu52 reviewed Jan 21, 2025

alexsu52 reviewed Jan 21, 2025

Introduced suggested changes

f1dc6ac

AlexanderDokuchaev requested changes Jan 21, 2025

nikita-savelyevv added 5 commits January 22, 2025 15:56

Implement requested changes

c384df9

Add debug conditions

2a13717

Revert "Add debug conditions"

92a3d9a

This reverts commit 629705c46fbbdff81b8c3d0fed2299dbc5576603.

Merge branch 'develop' into compress-via-openvino

74d1d74

Fix tests

06c3447

nikita-savelyevv requested review from alexsu52 and AlexanderDokuchaev January 22, 2025 16:57

Using < is being deprecated

841b807

alexsu52 reviewed Jan 23, 2025

Address minor comments

9a533d0

AlexanderDokuchaev approved these changes Jan 23, 2025

Add todo

68eef07

alexsu52 approved these changes Jan 23, 2025

alexsu52 merged commit f3f232f into openvinotoolkit:develop Jan 23, 2025
17 checks passed

nikita-savelyevv mentioned this pull request Jan 23, 2025

Follow up to #2727 #3211

Merged
